Training budget




Stepsize anything: A unified learning rate schedule for budgeted-iteration training

Tang, Anda, Dong, Yiming, Zeng, Yutao, Xun, zhou, Lin, Zhouchen

arXiv.org Artificial Intelligence

The expanding computational costs and limited resources underscore the critical need for budgeted-iteration training, which aims to achieve optimal learning within predetermined iteration budgets. While learning rate schedules fundamentally govern the performance of different networks and tasks, particularly in budgeted-iteration scenarios, their design remains largely heuristic and lacks theoretical foundations. In addition, finding the optimal learning rate schedule requires extensive trial-and-error selection, making the training process inefficient. In this work, we propose the Unified Budget-Aware (UBA) schedule, a theoretically grounded learning rate schedule that consistently outperforms commonly-used schedules across diverse architectures and tasks under different constrained training budgets. First, we bridge this gap by constructing a novel training-budget-aware optimization framework, which explicitly accounts for robustness to landscape curvature variations. From this framework, we derive the UBA schedule, controlled by a single hyper-parameter φ that provides a trade-off between flexibility and simplicity, eliminating the need for per-network numerical optimization. Moreover, we establish a theoretical connection between φ and the condition number, adding interpretation and justification to our approach. We also prove convergence for different values of φ and offer practical guidelines for its selection via theoretical analysis and empirical results. Extensive experimental results show that UBA consistently surpasses commonly-used schedules across diverse vision and language tasks, spanning network architectures (e.g., ResNet, OLMo) and scales, under different training-iteration budgets.
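The UBA formula itself is given in the paper; purely to illustrate the budgeted-iteration setting, below is a minimal Python sketch of a schedule family controlled by a single shape parameter over a fixed iteration budget. The polynomial-decay form and the name `phi` are illustrative assumptions, not the paper's actual schedule.

```python
def budget_aware_lr(step, total_steps, base_lr=1e-3, phi=1.0):
    """Illustrative single-parameter schedule over a fixed iteration budget.

    NOTE: this polynomial-decay family is an assumed stand-in; the actual UBA
    schedule and the role of its hyper-parameter phi are defined in the paper.
    """
    progress = min(step / max(total_steps, 1), 1.0)
    # phi trades off a near-constant schedule (small phi) against aggressive decay (large phi).
    return base_lr * (1.0 - progress) ** phi

# Sweeping one shape parameter replaces hand-designing a schedule per network and budget.
budget = 10_000
for phi in (0.5, 1.0, 2.0):
    lrs = [budget_aware_lr(t, budget, phi=phi) for t in range(budget)]
    print(f"phi={phi}: start={lrs[0]:.2e}, mid={lrs[budget // 2]:.2e}, end={lrs[-1]:.2e}")
```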


Adaptive Discretization for Consistency Models

Bai, Jiayu, Feng, Zhanbo, Deng, Zhijie, Hou, Tianqi, Qiu, Robert C., Ling, Zenan

arXiv.org Machine Learning

Consistency Models (CMs) have shown promise for efficient one-step generation. However, most existing CMs rely on manually designed discretization schemes, which require repeated adjustment for different noise schedules and datasets. To address this, we propose a unified framework for the automatic and adaptive discretization of CMs, formulating it as an optimization problem with respect to the discretization step. Concretely, during the consistency training process, we propose using local consistency as the optimization objective to ensure trainability by avoiding excessive discretization, and taking global consistency as a constraint to ensure stability by controlling the denoising error in the training target. We establish the trade-off between local and global consistency with a Lagrange multiplier. Building on this framework, we achieve adaptive discretization for CMs using the Gauss-Newton method. We refer to our approach as ADCMs. Experiments demonstrate that ADCMs significantly improve the training efficiency of CMs, achieving superior generative performance with minimal training overhead on both CIFAR-10 and ImageNet. Moreover, ADCMs exhibit strong adaptability to more advanced DM variants. Code is available at https://github.com/rainstonee/ADCM.
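As a rough illustration of the local/global trade-off described above, the sketch below combines a local-consistency objective with a penalized global-consistency constraint via a Lagrange multiplier. The function and argument names are hypothetical placeholders; the actual estimators and the Gauss-Newton update of the discretization step are specified in the paper.

```python
def adcm_style_objective(local_loss: float, global_gap: float,
                         global_budget: float, lam: float) -> float:
    """Toy Lagrangian: minimize local consistency subject to a bound on the
    global denoising error. All inputs are assumed scalar placeholders."""
    violation = max(global_gap - global_budget, 0.0)  # penalize only constraint violations
    return local_loss + lam * violation

# Example: a small constraint violation adds lam * violation to the training objective.
print(adcm_style_objective(local_loss=0.12, global_gap=0.30, global_budget=0.25, lam=2.0))
```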



To all reviewers, thank you very much for your thoughtful comments and suggestions

Neural Information Processing Systems

To all reviewers, thank you very much for your thoughtful comments and suggestions. R#1: "...importance of similarity among the selected tasks..." R#1: "...domain randomization, when enough samples are used, is a better alternative to meta-learning..." R#2: "...Theorems 1 and 2 are asymptotic..." Hence, the theorems are NOT asymptotic. We will remove the asymptotic parts for clarity. R#2: "Assumption 2 ... the per-task optimal models are centered around the corresponding optimal solutions." This assumption can easily be dropped at the cost of including the distance as a term.




Meek Models Shall Inherit the Earth

Gundlach, Hans, Lynch, Jayson, Thompson, Neil

arXiv.org Artificial Intelligence

The past decade has seen incredible scaling of AI systems by a few companies, leading to inequality in AI model performance. This paper argues that, contrary to prevailing intuition, the diminishing returns to compute scaling will lead to a convergence of AI model capabilities. In other words, meek models (those with limited computation budget) shall inherit the earth, approaching the performance level of the best models overall. We develop a model illustrating that under a fixed-distribution next-token objective, the marginal capability returns to raw compute shrink substantially. Given current scaling practices, we argue that these diminishing returns are strong enough that even companies that can scale their models exponentially faster than other organizations will eventually have little advantage in capabilities. As part of our argument, drawing on benchmark data and theoretical performance models, we give several reasons why proxies such as training-loss differences capture important capability measures. In addition, we analyze empirical data on the capability difference of AI models over time. Finally, in light of the increasing ability of meek models, we argue that AI strategy and policy require reexamination, and we outline the areas this shift will affect.
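A back-of-the-envelope sketch of the diminishing-returns mechanism, assuming a standard power-law scaling of loss with compute (the functional form and constants below are illustrative assumptions, not the paper's fitted model): even as the compute ratio between a frontier lab and a "meek" lab explodes, the absolute loss gap shrinks because both losses approach the same irreducible floor.

```python
# Assumed power-law scaling of loss with training compute: L(C) = L_INF + A * C**(-ALPHA).
# Constants are illustrative only, not fitted to any real model family.
L_INF, A, ALPHA = 1.7, 400.0, 0.15

def loss(compute: float) -> float:
    return L_INF + A * compute ** (-ALPHA)

meek, frontier = 1e21, 1e24          # frontier starts 1000x ahead in compute
for year in range(0, 11, 2):
    gap = loss(meek) - loss(frontier)
    print(f"year {year:2d}: compute ratio {frontier / meek:.0e}, loss gap {gap:.3f}")
    meek *= 2 ** 2                   # meek lab doubles compute each year (2-year steps)
    frontier *= 10 ** 2              # frontier lab grows 10x each year (2-year steps)

# The loss gap shrinks (about 0.18 -> 0.10 here) even though the compute ratio keeps growing.
```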


Combatting Dimensional Collapse in LLM Pre-Training Data via Diversified File Selection

Fan, Ziqing, Du, Siyuan, Hu, Shengchao, Wang, Pingjie, Shen, Li, Zhang, Ya, Tao, Dacheng, Wang, Yanfeng

arXiv.org Artificial Intelligence

Selecting high-quality pre-training data for large language models (LLMs) is crucial for enhancing their overall performance under a limited computation budget, improving both training and sample efficiency. Recent advancements in file selection primarily rely on using an existing or trained proxy model to assess the similarity of samples to a target domain, such as the high-quality sources BookCorpus and Wikipedia. However, upon revisiting these methods, the domain-similarity selection criterion exhibits a diversity dilemma, i.e., dimensional collapse in the feature space, improving performance on domain-related tasks but causing severe degradation in generic performance. To prevent collapse and enhance diversity, we propose a DiverSified File selection algorithm (DiSF), which selects the most decorrelated text files in the feature space. We approach this with a classical greedy algorithm to achieve more uniform eigenvalues in the feature covariance matrix of the selected texts, analyzing its approximation to the optimal solution under a formulation of the $\gamma$-weakly submodular optimization problem. Empirically, we establish a benchmark and conduct extensive experiments on the TinyLlama architecture with models from 120M to 1.1B parameters. Evaluating across nine tasks from the Harness framework, DiSF demonstrates a significant improvement in overall performance. Specifically, DiSF saves 98.5% of the 590M training files in SlimPajama, outperforming full-data pre-training within a 50B training budget, and achieving about 1.5x training efficiency and 5x data efficiency.
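To make the idea concrete, below is a minimal greedy-selection sketch in the spirit of the description above: iteratively pick the file whose features keep the eigenvalues of the selected set's feature covariance closest to uniform. The uniformity score and helper names are assumptions for illustration and are not claimed to match DiSF's exact criterion.

```python
import numpy as np

def spectrum_uniformity(features: np.ndarray) -> float:
    """Higher score = more uniform covariance eigenvalues (less dimensional collapse).
    Illustrative stand-in for a diversity criterion; not DiSF's exact objective."""
    cov = np.cov(features, rowvar=False)
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    p = eigvals / (eigvals.sum() + 1e-12)
    return -float(np.sum((p - 1.0 / len(p)) ** 2))

def greedy_diversified_selection(file_features: np.ndarray, k: int) -> list:
    """Greedily pick k files whose pooled features keep the covariance spectrum flat."""
    selected, remaining = [], list(range(len(file_features)))
    while len(selected) < k and remaining:
        best_idx, best_score = remaining[0], -np.inf
        for idx in remaining:
            candidate = file_features[selected + [idx]]
            # A covariance estimate needs at least two rows; score the first pick neutrally.
            score = spectrum_uniformity(candidate) if len(candidate) > 1 else 0.0
            if score > best_score:
                best_idx, best_score = idx, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected

# Usage with random stand-in embeddings (one feature vector per candidate text file)
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 16))
print(greedy_diversified_selection(features, k=10))
```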